Abstract

We address the problem of general function release under differential
privacy, by developing a functional mechanism that applies under the
weak assumptions of oracle access to target function evaluation and
sensitivity. These conditions permit treatment of functions described
explicitly or implicitly as algorithmic black boxes. We achieve this
result by leveraging the iterated Bernstein operator for polynomial
approximation of the target function, and polynomial coefficient
perturbation. Under weak regularity conditions, we establish fast rates
on utility measured by high-probability uniform approximation. We
provide a lower bound on the utility achievable for any functional
mechanism that is $\epsilon$-differentially private. The generality of
our mechanism is demonstrated by the analysis of a number of example
learners, including naive Bayes, non-parametric estimators and
regularized empirical risk minimization. Competitive rates are
demonstrated for kernel density estimation; and $\epsilon$-differential
privacy is achieved for a broader class of support vector machines than
known previously.

1 Introduction 

In recent years, differential privacy (Dwork et al. 2006) has emerged
as a leading paradigm for privacy-preserving statistical analyses. It
provides formal guarantees that aggregate statistics output by a
randomized mechanism are not significantly influenced by the presence
or absence of an individual datum. Where the Laplace mechanism (Dwork
et al. 2006) is a de facto approach for converting vector-valued
functions to differential privacy, in this paper we seek an equivalent
approach for privatizing function-valued mappings. We achieve our goal
through the development of a novel Bernstein functional mechanism.
Unlike existing mechanisms, ours applies to releasing explicitly and
implicitly defined functions, and is characterized by a full
theoretical analysis.

Our setting is the release of functions that depend on
privacy-sensitive training data, and that can be subsequently evaluated
on arbitrary test points. This non-interactive setting matches a wide
variety of learning tasks from naive Bayes classification,
non-parametric methods (kernel density estimation and regression) where
the function of train and test data is explicit, to generalized linear
models, support vector machines where the function is only implicitly
defined by an iterative algorithm. Our generic mechanism is based on
functional approximation by Bernstein basis polynomials, specifically
via an iterated Bernstein operator. Privacy is guaranteed by sanitizing
the coefficients of approximation, which requires only function
evaluation. It is the very limited oracle access required by our
mechanism (to non-private function evaluation and sensitivity) that
grants it broad applicability akin to the Laplace mechanism.

The Bernstein polynomials central to our mechanism are used in the
Stone-Weierstrass theorem to uniformly approximate any continuous
function on a closed interval. Moreover, the Bernstein operator offers
several advantages: its bounds are data-independent, it requires no
access to target function derivatives, and it yields approximations
that are pointwise convex combinations of the function evaluations on a
cover. As a result, applying privacy-preserving perturbations to the
approximation's coefficients permits us to control utility and achieve
fast convergence rates.

In addition to being analyzed in full, the Bernstein mechanism is easy
to use. We demonstrate this with a variety of example analyses of the
mechanism applied to learners. Finally, we provide a lower bound that
fundamentally limits utility under private function release, partly
resolving a question posed by Hall, Rinaldo, and Wasserman (2013). This
matches (up to logarithmic factors) our upper bound in the linear case.


Related Work. Polynomial approximation has proven useful in
differential privacy outside function release (Thaler, Ullman, and
Vadhan 2012; Chandrasekaran et al. 2014). Few previous attempts have
been made towards private function release. Hall, Rinaldo, and
Wasserman (2013) add Gaussian process noise which only yields a weaker
form of privacy, namely $(\epsilon, \delta)$-differential privacy, and
does not admit general utility rates. Zhang et al. (2012) introduce a
functional mechanism for the more specific task of perturbing the
objective in private optimization, but they assume separability in the
training data and do not obtain rates on utility.

Wang et al. (2013) propose a mechanism that releases a summary of data
in a trigonometric basis, able to respond to queries that are smooth as
in our setting, but are also required to be separable in the training
dataset as assumed by Zhang et al. (2012). A natural application is
kernel density estimation, which would achieve a rate of
$O\big((\log(1/\beta)/(n\epsilon))^{h/(\ell+h)}\big)$, as does our
approach. Private KDE has also been explored in various other settings
(Duchi, Wainwright, and Jordan 2013) and under weaker notions of
utility (Hall, Rinaldo, and Wasserman 2013). Zhang, Rubinstein, and
Dimitrakakis (2016) explore discrete naive Bayes under differential
privacy, while we investigate parametric Gaussian and non-parametric
KDE for class-conditional likelihoods.

As an example of an implicitly defined function, we consider
regularized empirical risk minimization such as logistic regression,
ridge regression, and the SVM. Previous mechanisms for private SVM
release and ERM more generally (Chaudhuri and Monteleoni 2008;
Rubinstein et al. 2012; Chaudhuri, Monteleoni, and Sarwate 2011; Jain
and Thakurta 2014; 2013; Bassily, Smith, and Thakurta 2014) require
finite-dimensional feature mapping or translation-invariant kernels.
Hall, Rinaldo, and Wasserman (2013) consider more general mappings but
provide $(\epsilon, \delta)$-differential privacy. Our treatment of
regularized ERM extends to kernels that may be translation-variant with
infinite-dimensional mappings, while providing stronger privacy
guarantees.

2 Preliminaries 

Notation and Problem Setting. Throughout the paper, vectors are
written in bold and the $i$-th component of a vector $x$ is denoted by
$x_i$. We consider $\mathcal{X}$ an arbitrary (possibly infinite)
domain and $\mathcal{D} \in \mathcal{X}^n$ a database of $n$ points in
$\mathcal{X}$. We refer to $n$ as the size of the database
$\mathcal{D}$. For a positive integer $\ell$, let
$\mathcal{Y} = [0,1]^\ell$ be a set of query points and
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ the target
function. Once the database $\mathcal{D}$ is fixed, we denote by
$F_{\mathcal{D}} = F(\mathcal{D}, \cdot)$ the function parameterized by
$\mathcal{D}$ that we aim to release. For example: $\mathcal{D}$ might
represent a training set (over $\mathcal{X}$ a product space of feature
vectors and labels) with $\mathcal{Y}$ representing test points from
the same feature space; $F_{\mathcal{D}}$ would then be a classifier
resulting from training on $\mathcal{D}$. Section 6 presents examples
for $F$. In Section 3, we show how to privately release the function
$F_{\mathcal{D}}$ and we provide alternative error bounds depending on
the regularity of $F$.

Definition 1. Let $h$ be a positive integer and $T > 0$. A function
$f: [0,1]^\ell \to \mathbb{R}$ is $(h, T)$-smooth if it is
$C^h([0,1]^\ell)$ and its partial derivatives up to order $h$ are all
bounded by $T$.

Definition 2. Let $0 < \gamma \le 1$ and $L > 0$. A function
$f: [0,1]^\ell \to \mathbb{R}$ is $(\gamma, L)$-Hölder continuous if
for every $x, y \in [0,1]^\ell$,
$|f(x) - f(y)| \le L \|x - y\|_2^\gamma$. When $\gamma = 1$, we refer
to $f$ as $L$-Lipschitz.

Our goal is to develop a private release mechanism for the function
$F_{\mathcal{D}}$ in the non-interactive setting. A non-interactive
mechanism takes a function $F$ and a database $\mathcal{D}$ as inputs
and outputs a synopsis $A$ which can be used to evaluate the function
$F_{\mathcal{D}}$ on $\mathcal{Y}$ without accessing the database
$\mathcal{D}$ further.

Differential Privacy. To provide strong privacy guarantees on the
release of $F_{\mathcal{D}}$, we adopt the well-established notion of
differential privacy.

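Definition 3 (Dwork et al. 2006). Let $\mathcal{R}$ be a (possibly
infinite) set of responses. A mechanism
$\mathcal{M}: \mathcal{X}^* \to \mathcal{R}$ (meaning that, for every
$\mathcal{D} \in \mathcal{X}^* = \bigcup_{n>0} \mathcal{X}^n$,
$\mathcal{M}(\mathcal{D})$ is an $\mathcal{R}$-valued random variable)
is said to provide $(\epsilon, \delta)$-differential privacy for
$\epsilon > 0$ and $0 \le \delta < 1$ if for every $n \in \mathbb{N}$,
for every pair
$(\mathcal{D}, \mathcal{D}') \in \mathcal{X}^n \times \mathcal{X}^n$ of
databases differing in one entry only (henceforth denoted by
$\mathcal{D} \sim \mathcal{D}'$), and for every measurable
$S \subseteq \mathcal{R}$, we have
$P[\mathcal{M}(\mathcal{D}) \in S] \le e^{\epsilon} P[\mathcal{M}(\mathcal{D}') \in S] + \delta$.
If $\delta = 0$ we simply say that $\mathcal{M}$ provides
$\epsilon$-differential privacy.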

By limiting the influence of data on the induced response distribution,
a powerful adversary (with knowledge of all but one input datum, the
mechanism up to random source, and unbounded computation) cannot
effectively identify an unknown input datum from mechanism responses.
The Laplace mechanism (Dwork et al. 2006) is a generic tool for
differential privacy: adding zero-mean Laplace noise* to a
vector-valued function provides privacy if the noise is calibrated to
the function's sensitivity.

Definition 4 (Dwork et al. 2006). The sensitivity of a function
$f: \mathcal{X}^n \to \mathbb{R}^d$ is given by
$S(f) = \sup_{\mathcal{D} \sim \mathcal{D}'} \|f(\mathcal{D}) - f(\mathcal{D}')\|_1$,
where the supremum is taken over all
$\mathcal{D}, \mathcal{D}' \in \mathcal{X}^n$ that differ in one entry
only. The sensitivity of a function
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ is defined as
$S(F) = \sup_{y \in \mathcal{Y}} S(F(\cdot, y))$.

Lemma 1 (Dwork et al. 2006). Let $f: \mathcal{X}^n \to \mathbb{R}^d$ be
a non-private function of finite sensitivity, and let
$Z \sim \mathrm{Lap}(S(f)/\epsilon)^d$. Then, the random function
$\tilde{f}(\mathcal{D}) = f(\mathcal{D}) + Z$ provides
$\epsilon$-differential privacy.
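
As an illustration of Lemma 1, the following minimal sketch adds
coordinate-wise Laplace noise of scale $S(f)/\epsilon$ to a vector of
function values. The function names are hypothetical and not part of
the paper; the Laplace sample is drawn as the difference of two
exponential variables, which is one standard construction and an
assumption of this sketch.

```python
import random


def sample_laplace(scale, rng):
    """Draw one Lap(scale) sample as a difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Add i.i.d. Lap(sensitivity / epsilon) noise to each coordinate of values."""
    scale = sensitivity / epsilon
    return [v + sample_laplace(scale, rng) for v in values]
```

For instance, `laplace_mechanism([3.0, 1.5], sensitivity=1.0,
epsilon=0.5, rng=random.Random(0))` perturbs each of the two
coordinates with noise of scale 2.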

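Given a mechanism, we measure its accuracy as follows.

Definition 5. Let $F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$.
A mechanism $\mathcal{M}$ is $(\alpha, \beta)$-accurate with respect to
$F_{\mathcal{D}}$ if for any database $\mathcal{D} \in \mathcal{X}^n$
and $A = \mathcal{M}(\mathcal{D})$, with probability at least
$1 - \beta$ over the randomness of $\mathcal{M}$,
$\sup_{y \in \mathcal{Y}} |A(y) - F_{\mathcal{D}}(y)| \le \alpha$.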

3 The Bernstein Mechanism 

Algorithm 1 introduces a differentially-private mechanism for
releasing $F_{\mathcal{D}}: \mathcal{Y} \to \mathbb{R}$, a family of
$(h, T)$-smooth or $(\gamma, L)$-Hölder continuous functions,
parameterized by $\mathcal{D} \in \mathcal{X}^n$.


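Algorithm 1 The Bernstein mechanism

Sanitization
  Inputs: private dataset $\mathcal{D} \in \mathcal{X}^n$; sensitivity
    $S(F)$ and oracle access to target
    $F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$
  Parameters: cover size $k$, Bernstein order $h$ positive integers;
    privacy budget $\epsilon > 0$
  1: $P \leftarrow (\{0, 1/k, 2/k, \ldots, 1\})^\ell$  (lattice cover of $\mathcal{Y}$)
  2: $\lambda \leftarrow S(F)(k+1)^\ell/\epsilon$  (perturbation scale)
  3: For each $p = (p_1, \ldots, p_\ell) \in P$:
  4:     $\widetilde{F}_{\mathcal{D}}(p) \leftarrow F_{\mathcal{D}}(p) + Z$, where $Z \sim \mathrm{Lap}(\lambda)$
  5: Return: $\{\widetilde{F}_{\mathcal{D}}(p) \mid p \in P\}$

Evaluation
  Inputs: query $y \in \mathcal{Y}$; private response
    $\{\widetilde{F}_{\mathcal{D}}(p) \mid p \in P\}$
  6: Compute basis $b^{(h)}_{\nu_i,k}(y_i)$ for every $0 \le \nu_i \le k$,
     $1 \le i \le \ell$  (see Definition 8)
  7: Return: $\sum_{\nu_1=0}^{k} \cdots \sum_{\nu_\ell=0}^{k} \widetilde{F}_{\mathcal{D}}(\nu_1/k, \ldots, \nu_\ell/k) \prod_{i=1}^{\ell} b^{(h)}_{\nu_i,k}(y_i)$


* A $\mathrm{Lap}(\lambda)$-distributed real random variable $Z$ has
probability density proportional to $\exp(-|y|/\lambda)$.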




The mechanism makes use of the iterated Bernstein polynomial of
$F_{\mathcal{D}}$, which we introduce next (for a comprehensive survey
refer to Lorentz 1953, Micchelli 1973). This approximation consists of
a linear combination of so-called Bernstein basis polynomials, whose
coefficients are evaluations of target $F_{\mathcal{D}}$ on a (lattice)
cover $P$.

We briefly introduce the univariate Bernstein basis polynomials and
state some of their properties.

Definition 6. Let $k$ be a positive integer. The Bernstein basis
polynomials of degree $k$ are defined as
$b_{\nu,k}(y) = \binom{k}{\nu} y^{\nu} (1-y)^{k-\nu}$ for
$\nu = 0, \ldots, k$.

Proposition 2 (Lorentz 1953). For every $y \in [0,1]$, any positive
integer $k$ and $0 \le \nu \le k$, we have $b_{\nu,k}(y) \ge 0$ and
$\sum_{\nu=0}^{k} b_{\nu,k}(y) = 1$.

In order to introduce the iterated Bernstein polynomials, we first need
to recall the Bernstein operator.

Definition 7. Let $f: [0,1] \to \mathbb{R}$ and $k$ be a positive
integer. The Bernstein polynomial of $f$ of degree $k$ is defined as
$B_k(f; y) = \sum_{\nu=0}^{k} f(\nu/k)\, b_{\nu,k}(y)$.

The Bernstein operator $B_k$ maps a function $f$, defined on $[0,1]$,
to $B_k f$, where the function $B_k f$ evaluated at $y$ is
$B_k(f; y)$. Note that the Bernstein operator is linear and if
$f(y) \in [a_1, a_2]$ for every $y \in [0,1]$, then from Proposition 2
it follows that $B_k(f; y) \in [a_1, a_2]$ for every positive integer
$k$ and $y \in [0,1]$. Moreover, it is not hard to show that any linear
function is a fixed point for $B_k$. For completeness, we provide a
short proof in Appendix A.
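
The basis polynomials of Definition 6 and the Bernstein polynomial of
Definition 7 can be sketched directly from their formulas. The function
names below are illustrative choices, not notation from the paper.

```python
from math import comb


def bernstein_basis(nu, k, y):
    """Bernstein basis polynomial b_{nu,k}(y) = C(k, nu) y^nu (1 - y)^(k - nu)."""
    return comb(k, nu) * y**nu * (1.0 - y) ** (k - nu)


def bernstein_poly(f, k, y):
    """Bernstein polynomial B_k(f; y) = sum_nu f(nu / k) b_{nu,k}(y)."""
    return sum(f(nu / k) * bernstein_basis(nu, k, y) for nu in range(k + 1))
```

Evaluating `bernstein_poly` on a linear function returns the function
itself (the fixed-point property), while for $f(t) = t^2$ it returns
$y^2 + y(1-y)/k$, so the approximation error decays like $1/k$.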

Definition 8 (Micchelli 1973). Let $h$ be a positive integer. The
iterated Bernstein operator of order $h$ is defined as the sequence of
linear operators
$B_k^{(h)} = I - (I - B_k)^h = \sum_{i=1}^{h} \binom{h}{i} (-1)^{i-1} B_k^i$,
where $I = B_k^0$ denotes the identity operator and $B_k^i$ is defined
inductively as $B_k^i = B_k \circ B_k^{i-1}$ for $i \ge 1$. The
iterated Bernstein polynomial of order $h$ can then be computed as:

$$B_k^{(h)}(f; y) = \sum_{\nu=0}^{k} f(\nu/k)\, b^{(h)}_{\nu,k}(y),$$

where
$b^{(h)}_{\nu,k}(y) = \sum_{i=1}^{h} \binom{h}{i} (-1)^{i-1} B_k^{i-1}(b_{\nu,k}; y)$.

We observe that $B_k^{(1)} = B_k$. Although the bases
$b^{(h)}_{\nu,k}$ are not always positive for $h \ge 2$, we still have
$\sum_{\nu=0}^{k} b^{(h)}_{\nu,k}(y) = 1$ for every $y \in [0,1]$. The
iterated Bernstein polynomial of a multivariate function
$f: [0,1]^\ell \to \mathbb{R}$ is analogously defined.
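
One way to evaluate the univariate iterated Bernstein polynomial of
Definition 8 uses only the lattice values $f(\nu/k)$: since $B_k^i f$
depends on $B_k^{i-1} f$ only through its values on the lattice, the
powers of $B_k$ can be advanced on the lattice itself. The sketch below
follows this route; the function names and the coefficient-list
representation are assumptions made for illustration.

```python
from math import comb


def basis(nu, k, y):
    """Bernstein basis polynomial b_{nu,k}(y)."""
    return comb(k, nu) * y**nu * (1.0 - y) ** (k - nu)


def iterated_bernstein(coeffs, h, y):
    """Evaluate B_k^{(h)}(f; y), where coeffs[nu] = f(nu / k)."""
    k = len(coeffs) - 1
    current = list(coeffs)  # lattice values of B_k^{i-1} f, starting at i = 1
    total = 0.0
    for i in range(1, h + 1):
        # B_k^i f at y is the Bernstein polynomial of B_k^{i-1} f.
        term = sum(current[nu] * basis(nu, k, y) for nu in range(k + 1))
        total += comb(h, i) * (-1) ** (i - 1) * term
        # Advance to the lattice values of B_k^i f.
        current = [
            sum(current[nu] * basis(nu, k, mu / k) for nu in range(k + 1))
            for mu in range(k + 1)
        ]
    return total
```

For $f(t) = t^2$, $k = 10$ and $y = 0.5$, order $h = 1$ gives $0.275$
(error $0.025$) whereas order $h = 2$ gives $0.2525$ (error $0.0025$),
illustrating the faster convergence of higher orders.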

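Definition 9. Assume $f: [0,1]^\ell \to \mathbb{R}$ and let
$k_1, \ldots, k_\ell, h$ be positive integers. The (multivariate)
iterated Bernstein polynomial of $f$ (of order $h$) is defined as

$$B^{(h)}_{k_1,\ldots,k_\ell}(f; y) = \sum_{\nu_1=0}^{k_1} \cdots \sum_{\nu_\ell=0}^{k_\ell} f\left(\frac{\nu_1}{k_1}, \ldots, \frac{\nu_\ell}{k_\ell}\right) \prod_{i=1}^{\ell} b^{(h)}_{\nu_i,k_i}(y_i).$$

For ease of exposition, we fix user-selected $k \in \mathbb{N}$ such
that $k_1 = \ldots = k_\ell = k$. The Bernstein mechanism perturbs the
evaluation of $F_{\mathcal{D}}$ on a lattice cover $P$ of
$\mathcal{Y} = [0,1]^\ell$ parameterized by $k$.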
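
The two phases of Algorithm 1 can be sketched for the univariate case
$\ell = 1$ as follows. This is a minimal illustration under stated
assumptions rather than a reference implementation: the names
`sanitize` and `evaluate` are hypothetical, the target is passed as a
Python callable standing in for oracle access, and Laplace noise is
drawn as a difference of two exponentials.

```python
import random
from math import comb


def basis(nu, k, y):
    """Bernstein basis polynomial b_{nu,k}(y)."""
    return comb(k, nu) * y**nu * (1.0 - y) ** (k - nu)


def sample_laplace(scale, rng):
    """Lap(scale) sample as a difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize(target, sensitivity, k, epsilon, rng):
    """Steps 1-5 for ell = 1: noisy evaluations on {0, 1/k, ..., 1}."""
    scale = sensitivity * (k + 1) / epsilon  # perturbation scale lambda
    return [target(nu / k) + sample_laplace(scale, rng) for nu in range(k + 1)]


def evaluate(noisy, h, y):
    """Steps 6-7 for ell = 1: iterated Bernstein polynomial of order h at y."""
    k = len(noisy) - 1
    current, total = list(noisy), 0.0
    for i in range(1, h + 1):
        term = sum(current[nu] * basis(nu, k, y) for nu in range(k + 1))
        total += comb(h, i) * (-1) ** (i - 1) * term
        current = [
            sum(current[nu] * basis(nu, k, mu / k) for nu in range(k + 1))
            for mu in range(k + 1)
        ]
    return total
```

Once `sanitize` has been run, `evaluate` can be called on any number of
query points without touching the data again, which mirrors the
non-interactive setting described above.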


4 Analysis of Mechanism Privacy and Utility 

In the following result, we assume $\ell$ to be an arbitrary but fixed
constant with $\mathcal{Y} = [0,1]^\ell$. We underline that this is a
common assumption in the differential privacy literature, especially
when dealing with Euclidean spaces (Blum, Ligett, and Roth 2008; Dwork
and Lei 2009; Wasserman and Zhou 2010; Lei 2011; Wang et al. 2013).

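Theorem 3 (Main Theorem). Let $\ell, h \in \mathbb{N}_+$,
$0 < \gamma \le 1$, $L > 0$ and $T > 0$ be constants. Let
$\mathcal{X}$ be an arbitrary space and $\mathcal{Y} = [0,1]^\ell$. Let
furthermore $F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ with
$S(F) = O(1)$. For $\epsilon > 0$, the Bernstein mechanism
$\mathcal{M}$ provides $\epsilon$-differential privacy. Moreover, for
$0 < \beta < 1$ the mechanism $\mathcal{M}$ is
$(\alpha, \beta)$-accurate with error scaling as follows, where hidden
constants depend on $\ell, L, \gamma, T, h$.

(i) If $F_{\mathcal{D}}$ is $(2h, T)$-smooth for every
$\mathcal{D} \in \mathcal{X}^n$, there exists
$k = k(S(F), \epsilon, \beta, \ell, h, T)$ such that
$\alpha = O\left(\left(\frac{S(F)}{\epsilon} \log(1/\beta)\right)^{\frac{h}{\ell+h}}\right)$;

(ii) If $F_{\mathcal{D}}$ is $(\gamma, L)$-Hölder continuous for every
$\mathcal{D} \in \mathcal{X}^n$, there exists
$k = k(S(F), \epsilon, \beta, \ell, \gamma, L)$ such that
$\alpha = O\left(\left(\frac{S(F)}{\epsilon} \log(1/\beta)\right)^{\frac{\gamma}{2\ell+\gamma}}\right)$;

(iii) If $F_{\mathcal{D}}$ is linear for every
$\mathcal{D} \in \mathcal{X}^n$, there exists a constant $k$ such that
$\alpha = O\left(\frac{S(F)}{\epsilon} \log(1/\beta)\right)$.

Moreover, if $1/S(F) \le \mathrm{poly}(n)$, then the running-time of
the mechanism and the running-time per evaluation are both polynomial
in $n$ and $1/\epsilon$.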

4.1 Proof of the Main Theorem†

To prove privacy we note that only the coefficients of the Bernstein
polynomial of $F_{\mathcal{D}}$ are sensitive and need to be protected.
In order to provide $\epsilon$-differential privacy, these coefficients
(evaluations of target $F_{\mathcal{D}}$ on a cover) are perturbed by
means of Lemma 1. In this way, we can release the sanitized
coefficients and use them for unlimited, efficient evaluation of the
approximation of $F_{\mathcal{D}}$ over $\mathcal{Y}$, without further
access to the data $\mathcal{D}$. To establish utility, we separately
analyze error due to the polynomial approximation of $F_{\mathcal{D}}$
and error due to perturbation.

In order to analyze the accuracy of our mechanism, we denote by
$\widetilde{B}_k^{(h)}(F_{\mathcal{D}}; y)$ the iterated Bernstein
polynomial of order $h$ constructed using the coefficients output by
the mechanism $\mathcal{M}$. The error $\alpha$ introduced by the
mechanism can be expressed as follows:

$$\alpha = \max_{y \in [0,1]^\ell} \left| F_{\mathcal{D}}(y) - \widetilde{B}_k^{(h)}(F_{\mathcal{D}}; y) \right| \tag{1}$$

$$\le \max_{y \in [0,1]^\ell} \left| B_k^{(h)}(F_{\mathcal{D}}; y) - \widetilde{B}_k^{(h)}(F_{\mathcal{D}}; y) \right| + \max_{y \in [0,1]^\ell} \left| F_{\mathcal{D}}(y) - B_k^{(h)}(F_{\mathcal{D}}; y) \right|. \tag{2}$$

† For sake of clarity, in Appendix B we provide a self-contained proof
of Theorem 3 for $\ell = 1$. Although it is not a prerequisite to the
general result, it reflects the building blocks used in this section.



For every $y \in [0,1]^\ell$, the first summand in Equation (2)
consists of the absolute value of an affine combination of independent
Laplace-distributed random variables.

Proposition 4. Let
$\Gamma = \{\nu \in \mathbb{N}^\ell \mid 0 \le \nu_j \le k \text{ for } 1 \le j \le \ell\}$.
For every $\nu = (\nu_1, \ldots, \nu_\ell) \in \Gamma$ let
$Z_\nu \sim \mathrm{Lap}(\lambda)$ independently. Moreover, let
$\tau > 0$ and constant $C_{h,\ell}$ depend only on $h, \ell$. Then:

$$P\left[ \max_{y \in [0,1]^\ell} \left| \sum_{\nu \in \Gamma} Z_\nu \prod_{j=1}^{\ell} b^{(h)}_{\nu_j,k}(y_j) \right| \ge \tau \right] \le e^{-\tau/(C_{h,\ell}\lambda)}.$$

The proof of Proposition 4 follows from a result of Proschan (1965) on
the concentration of convex combinations of random variables drawn
i.i.d. from a log-concave symmetric distribution. For completeness, we
give the full proof in Appendix D. Proposition 4 implies that with
probability at least $1 - \beta$ the first summand in Equation (2) is
bounded by $O\left(S(F) k^\ell \log(1/\beta)/\epsilon\right)$. In order
to bound the second summand we make use of the following
(unidimensional) convergence rates.


Theorem 5 (Micchelli 1973). Let $h$ be a positive integer and $T > 0$.
If $f: [0,1] \to \mathbb{R}$ is a $(2h, T)$-smooth function, then, for
all positive integers $k$ and $y \in [0,1]$,
$\left| f(y) - B_k^{(h)}(f; y) \right| \le T D_h k^{-h}$, where $D_h$
is a constant independent of $k$, $f$ and $y \in [0,1]$.

Theorem 6 (Kac 1938; Mathe 1999). Let $0 < \gamma \le 1$ and $L > 0$.
If $f: [0,1] \to \mathbb{R}$ is a $(\gamma, L)$-Hölder continuous
function, then
$\left| f(y) - B_k^{(1)}(f; y) \right| \le L (4k)^{-\gamma/2}$ for all
positive integers $k$ and $y \in [0,1]$.


By induction, it is possible to show that the approximation error of
the multivariate iterated Bernstein polynomial can be bounded by
$O(\ell g(k)) = O(g(k))$, if the error of the corresponding univariate
polynomial is bounded by $g(k)$. For completeness, we provide a proof
in Appendix E.

All in all, the error $\alpha$ introduced by the mechanism can thus be
bounded by

$$\alpha = O\left( g(k) + \frac{S(F) k^\ell}{\epsilon} \log(1/\beta) \right). \tag{3}$$

Since $g(k)$ is a decreasing function in $k$ and the second summand in
Equation (3) is an increasing function in $k$, the optimal value for
$k$ (up to a constant factor) is achieved when $k$ satisfies

$$g(k) = \frac{S(F) k^\ell}{\epsilon} \log(1/\beta). \tag{4}$$

Solving Equation (4) with the bound for $g(k)$ provided in Theorem 5
yields

$$k = \left( \frac{\epsilon}{S(F) \log(1/\beta)} \right)^{\frac{1}{h+\ell}},$$

and substituting the thus obtained value of $k$ into Equation (3)
yields the first statement of Theorem 3. Similarly, using the bound for
$g(k)$ provided in Theorem 6 we get the result for Hölder continuous
functions. The bound for linear functions follows from the fact that
the approximation error is zero for $h = 1$ and $k = 1$, since linear
functions are fixed points of $B_1^{(1)}$. Finally, the analysis of the
running time follows from observing that, for the optimal cover size
$k$ we computed, $k^\ell$ is always upper bounded by
$\epsilon/(S(F) \log(1/\beta))$ and thus by $\mathrm{poly}(n)$.
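
The balancing argument above can be turned into a small helper for
choosing the cover size in the smooth case. The sketch below sets all
hidden constants (such as $T D_h$ and $C_{h,\ell}$) to one, which is an
assumption made purely for illustration, and the function names are
hypothetical.

```python
import math


def optimal_cover_size(sensitivity, epsilon, beta, h, ell):
    """Cover size balancing k^(-h) against S(F) k^ell log(1/beta) / epsilon."""
    ratio = epsilon / (sensitivity * math.log(1.0 / beta))
    return max(1, math.ceil(ratio ** (1.0 / (h + ell))))


def smooth_error_rate(sensitivity, epsilon, beta, h, ell):
    """Rate (S(F) log(1/beta) / epsilon)^(h / (ell + h)) of Theorem 3(i)."""
    return (sensitivity * math.log(1.0 / beta) / epsilon) ** (h / (ell + h))
```

With sensitivity $1/1000$, $\epsilon = 1$, $\beta = 0.05$, $h = 2$ and
$\ell = 1$, the helper returns a cover size of 7, and the rate improves
as the sensitivity shrinks.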

4.2 Discussion 

Comparison to Baseline. Algorithm 1 is based on a relatively simple
approach: it evaluates the target function on a lattice cover, adding
Laplace noise for privacy. One might be tempted to approximate the
input function by rounding a query point $y$ to the nearest lattice
point $p$ and releasing the corresponding noisy evaluation
$\widetilde{F}_{\mathcal{D}}(p)$. Although it is straightforward to
prove that, for $(\gamma, L)$-Hölder continuous functions, such a
piecewise constant approximation achieves error $O(1/k^\gamma)$, this
upper bound is essentially tight, as can be shown by considering the
approximation error it achieves for linear functions. Therefore, this
method has two main disadvantages: the output function is not even
continuous (although we always consider continuous input functions) and
for highly smooth input functions it cannot achieve the fast
convergence rates of the Bernstein mechanism. In Section 6, we offer
further examples supporting this argument.
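
The gap between the two approaches on linear functions can be checked
numerically in the noise-free univariate case. In the sketch below, the
names `baseline` and `bernstein` are illustrative, and both functions
take the list of lattice evaluations as input.

```python
from math import comb


def baseline(coeffs, y):
    """Piecewise constant release: the value at the nearest lattice point."""
    k = len(coeffs) - 1
    return coeffs[min(k, max(0, round(y * k)))]


def bernstein(coeffs, y):
    """Bernstein polynomial built from the same lattice evaluations."""
    k = len(coeffs) - 1
    return sum(
        coeffs[nu] * comb(k, nu) * y**nu * (1.0 - y) ** (k - nu)
        for nu in range(k + 1)
    )
```

For $f(y) = y$ and $k = 10$, the piecewise constant rule has uniform
error $1/(2k) = 0.05$, while the Bernstein polynomial reproduces the
function up to floating-point error.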

$(\epsilon, \delta)$-Differential Privacy. We note that our analysis
can be easily extended to the relaxed notion of approximate
differential privacy using advanced composition theorems (see for
example Dwork and Roth 2014) instead of sequential composition (Dwork
et al. 2006). Specifically, it suffices to choose the perturbation
scale
$\lambda_\delta = 2 S(F) \sqrt{2 (k+1)^\ell \log(1/\delta)}/\epsilon$.

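Theorem 7. Let $0 < \delta < 1$. Under the same assumptions of
Theorem 3, the Bernstein mechanism $\mathcal{M}$ (with perturbation
scale $\lambda_\delta$) provides $(\epsilon, \delta)$-differential
privacy and is $(\alpha, \beta)$-accurate with error scaling as
follows.

(i) If $F_{\mathcal{D}}$ is $(2h, T)$-smooth for every
$\mathcal{D} \in \mathcal{X}^n$, there exists
$k = k(S(F), \epsilon, \delta, \beta, \ell, h, T)$ such that
$\alpha = O\left(\left(\frac{S(F)}{\epsilon} \log(1/\beta) \sqrt{\log(1/\delta)}\right)^{\frac{2h}{\ell+2h}}\right)$;
and

(ii) If $F_{\mathcal{D}}$ is $(\gamma, L)$-Hölder continuous for every
$\mathcal{D} \in \mathcal{X}^n$, there exists
$k = k(S(F), \epsilon, \delta, \beta, \ell, \gamma, L)$ such that
$\alpha = O\left(\left(\frac{S(F)}{\epsilon} \log(1/\beta) \sqrt{\log(1/\delta)}\right)^{\frac{\gamma}{\ell+\gamma}}\right)$.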

Even though this relaxation allows for improved accuracy, in this work
we explore a different point on the privacy-utility Pareto front and
focus our attention on $\epsilon$-differential privacy, since there is
generally a significant motivation for achieving stronger privacy
guarantees. Moreover, to the best of our knowledge, it is unknown
whether previous solutions (Hall, Rinaldo, and Wasserman 2013) even
apply to this framework.

5 Lower Bound 

In this section we present a lower bound on the error that any
$\epsilon$-differentially private mechanism approximating a function
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ must introduce.





Theorem 8. For $\epsilon > 0$, there exists a function
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ such that the
error that any $\epsilon$-differentially private mechanism
approximating $F$ introduces is $\Omega(S(F)/\epsilon)$, with
probability arbitrarily close to 1.

Proof. In order to prove Theorem 8, we consider
$\mathcal{X} \subset [0,1]^\ell$ to be a finite set and without loss of
generality we view the database $\mathcal{D}$ as an element of
$\mathcal{X}^n$ or as an element of $\mathbb{N}^{|\mathcal{X}|}$, i.e.,
a histogram over the elements of $\mathcal{X}$, interchangeably. We can
then make use of a general result provided by De (2012).

Proposition 9 (De 2012). Assume
$\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{2^s} \in \mathbb{N}^N$
such that, for every $i$, $\|\mathcal{D}_i\|_1 \le n$ and, for
$i \ne j$, $\|\mathcal{D}_i - \mathcal{D}_j\|_1 \le \Delta$. Moreover,
let $f: \mathbb{N}^N \to \mathbb{R}^d$ be such that for any $i \ne j$,
$\|f(\mathcal{D}_i) - f(\mathcal{D}_j)\|_\infty \ge \eta$. If
$\Delta \le (s-1)/\epsilon$, then any mechanism which is
$\epsilon$-differentially private for the query $f$ on databases of
size $n$ introduces an error which is $\Omega(\eta)$, with probability
arbitrarily close to 1.

Therefore, we only need to show that there exists a suitable sequence
of databases
$\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{2^s}$, a function
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ and a
$y \in \mathcal{Y}$ such that $F(\cdot, y)$ satisfies the assumptions
of Proposition 9. We actually show that this holds for every
$y \in \mathcal{Y}$. Let $\epsilon > 0$ and $V$ be a non-negative
integer. We define
$\mathcal{X} = (\{0, 1/(V+8), 2/(V+8), \ldots, 1\})^\ell$. Note that
$N = |\mathcal{X}| = (V+9)^\ell$. Let furthermore
$c = \lfloor 1/\epsilon \rfloor$ and $n = V + c$. The function
$F: \mathcal{X}^n \times [0,1]^\ell \to \mathbb{R}$ we consider is
defined as follows:

2 ^( 22 , y) = 'r]{dQ-\-- ■ ■+dN-z+2di\f-Q + .. .+8d]s[ + {y, 1 )), 

where $d_i$ corresponds to the number of entries in $\mathcal{D}$ whose
value is $x_i$, for every $x_i \in \mathcal{X}$. For $s = 3$, we
consider the sequence of databases
$\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_8$, where, for
$j \in \{1, 2, \ldots, 8\}$, $d_i \in \mathcal{D}_j$ is such that
ri, fori e (0,1 ,..., y- 1} 
di = Ic, for i = A — j A- 8 
[O, otherwise 

We first observe that, for every $j \in \{1, 2, \ldots, 8\}$,
$\|\mathcal{D}_j\|_1 = n$. Moreover, for $i \ne j$,
$\|\mathcal{D}_i - \mathcal{D}_j\|_1 = 2c \le 2/\epsilon$. Finally, for
$i \ne j$,
$|F(\mathcal{D}_i, y) - F(\mathcal{D}_j, y)| \ge c\eta$ for every
$y \in [0,1]^\ell$.

Since $S(F) = 7\eta$, Proposition 9 implies that, with high
probability, any $\epsilon$-differentially private mechanism
approximating $F$ must introduce an error of order
$\Omega(S(F)/\epsilon)$. □

6 Examples 

In this section, we demonstrate the versatility of the Bernstein
mechanism through the analysis of a range of example learners.

Kernel Density Estimation. Let
$\mathcal{X} = \mathcal{Y} = [0,1]^\ell$ and
$\mathcal{D} = (d_1, d_2, \ldots, d_n) \in \mathcal{X}^n$. For a given
kernel $K_H$, with bandwidth $H$ (a symmetric and positive definite
$\ell \times \ell$ matrix), the kernel density estimator
$F_H: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ is defined as
$F_H(\mathcal{D}, y) = \frac{1}{n} \sum_{i=1}^{n} K_H(y - d_i)$. It is
easy to see that
$S(F_H) \le \sup_{y \in [-1,1]^\ell} K_H(y)/n$. For instance, if $K_H$
is the Gaussian kernel with covariance matrix $H$, then
$S(F_H) \le 1/\left(n \sqrt{(2\pi)^\ell \det(H)}\right)$. Moreover,
observe that $F_H(\mathcal{D}, \cdot)$ is an $(h, T)$-smooth function
for any positive integer $h$. Hence the error introduced by the
mechanism is

$$O\left( \left( \frac{\log(1/\beta)}{n \epsilon \sqrt{\det(H)}} \right)^{\frac{h}{\ell+h}} \right)$$

with probability at least $1 - \beta$. In Figure 1 we display the
utility (averaged over 1000 repeats) of the Bernstein mechanism
($k = 20$) on 5000 points drawn from a mixture of two normal
distributions $N(0.5, 0.02)$ and $N(0.75, 0.005)$ with weights
$0.4, 0.6$, respectively. We first observe that for every privacy
budget $\epsilon$ there is a suitable choice of $h$ such that our
mechanism always achieves better utility compared to the baseline (cf.
Section 4.2). Moreover, accuracy improves for increasing $h$, except
for sufficiently large perturbations (small $\epsilon$) which more
significantly affect higher-order basis functions (larger $h$). Private
cross validation (Chaudhuri, Monteleoni, and Sarwate 2011; Chaudhuri
and Vinterbo 2013) can be used to tune $h$. We conclude noting that the
same error bounds can be provided by the mechanism of Wang et al.
(2013), since the function $F_H(\mathcal{D}, \cdot)$ is separable in
the training set $\mathcal{D}$, i.e.,
$F_H(\mathcal{D}, \cdot) = \sum_{d \in \mathcal{D}} f_H(d, \cdot)$.
However, this assumption is overly restrictive for many applications.
In the following, we discuss how the Bernstein mechanism can be
successfully applied to several such cases.

Figure 1: Private KDE with Gaussian kernel
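
For intuition on the sensitivity bound, the following sketch implements
the univariate ($\ell = 1$) Gaussian kernel density estimator, where the
bandwidth matrix $H$ reduces to a variance, together with the bound
$1/(n\sqrt{2\pi H})$. The function names and the parameter `variance`
are illustrative assumptions.

```python
import math


def gaussian_kernel(u, variance):
    """Univariate Gaussian kernel K_H(u) with H = variance."""
    return math.exp(-u * u / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def kde(data, y, variance):
    """Kernel density estimate F_H(D, y) = (1/n) sum_i K_H(y - d_i)."""
    return sum(gaussian_kernel(y - d, variance) for d in data) / len(data)


def kde_sensitivity_bound(n, variance):
    """Upper bound 1 / (n sqrt(2 pi H)) on S(F_H) for ell = 1."""
    return 1.0 / (n * math.sqrt(2.0 * math.pi * variance))
```

Replacing a single point of `data` changes `kde` at any query by at
most `kde_sensitivity_bound(len(data), variance)`, since only one of the
$n$ kernel terms is affected and each term lies between zero and the
kernel's maximum.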


Priestley-Chao Kernel Regression. For ease of exposition, consider
$\ell = 1$. For constant $B > 0$, let
$\mathcal{X} = [0,1] \times [-B, B]$ and $\mathcal{Y} = [0,1]$. Without
loss of generality, consider datasets
$\mathcal{D} = ((d_1, l_1), (d_2, l_2), \ldots, (d_n, l_n)) \in \mathcal{X}^n$,
where $d_1 \le d_2 \le \ldots \le d_n$, and for every
$i \in \{1, \ldots, n\}$ there exists $j \ne i$ such that
$|d_i - d_j| \le c/n$, for a given (and publicly known)
$0 < c = o(n)$. Small values of $c$ restrict the data space under
consideration, whereas $c = n$ would correspond to the general case
$\mathcal{D} \in \mathcal{X}^n$. For kernel $K$ and bandwidth $b > 0$,
the Priestley-Chao kernel estimator (Priestley and Chao 1972; Benedetti
1977) is defined as
$F_b(\mathcal{D}, y) = \frac{1}{b} \sum_{i=2}^{n} (d_i - d_{i-1}) K\left(\frac{y - d_i}{b}\right) l_i$.
This function is not separable in $\mathcal{D}$ and

$$S(F_b) \le \frac{4Bc}{nb} \sup_{y \in [-1,1]} K\left(\frac{y}{b}\right).$$

If $K$ is the Gaussian kernel, then with probability at least
$1 - \beta$ the error introduced by the mechanism can be bounded by

$$O\left( \left( \frac{S(F_b)}{\epsilon} \log(1/\beta) \right)^{\frac{h}{1+h}} \right).$$

Figure 2: Private SVM with Gaussian kernel

Naive Bayes Classification. In this example we apply the Bernstein
mechanism to a probabilistic learner. Without loss of generality,
assume $X = [0,1]^\ell$, $\mathcal{X} = X \times \{l^+, l^-\}$,
$\mathcal{Y} = X$ and
$\mathcal{D} = ((d_1, l_1), (d_2, l_2), \ldots, (d_n, l_n)) \in \mathcal{X}^n$.
A naive Bayes classifier can be interpreted as
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ such that
$F_{\mathcal{D}}(y) \propto P(y \mid l^+, \mathcal{D}) P(l^+ \mid \mathcal{D}) - P(y \mid l^-, \mathcal{D}) P(l^- \mid \mathcal{D})$.
Predictions can then be made by assigning the instance $y$ to the class
$l^+$ (resp. $l^-$) if $F_{\mathcal{D}}(y) \ge 0$ (resp.
$F_{\mathcal{D}}(y) < 0$). Since, for a class $l$,
$P(y \mid l, \mathcal{D}) \propto \prod_{i=1}^{\ell} P(y_i \mid l, \mathcal{D})$,
it is easy to show that $F_{\mathcal{D}}(\cdot)$ is an
$(h, T)$-smooth function whenever each likelihood is estimated using a
Gaussian distribution or KDE (John and Langley 1995) (with a
sufficiently smooth kernel). In the latter case, using a Gaussian
kernel, the sensitivity of $F$ can be bounded by
$S(F) \le 2\left(1/n + (2^\ell - 1)/(n \sqrt{2\pi}\, b)\right)$, where
$b$ is the chosen bandwidth. The detailed computation is provided in
Appendix F. The error introduced by the Bernstein mechanism is thus
bounded by

$$O\left( \left( \frac{S(F)}{\epsilon} \log(1/\beta) \right)^{\frac{h}{\ell+h}} \right)$$

with probability at least $1 - \beta$.

Regularized Empirical Risk Minimization. In the next examples, the
functions we aim to release are implicitly defined by an algorithm. Let
$X = [0,1]^\ell$, $\mathcal{X} = X \times [0,1]$ and
$\mathcal{Y} = X$. Let $L$ be a convex and locally $M$-Lipschitz (in
the first argument) loss function. For
$\mathcal{D} = ((d_1, l_1), (d_2, l_2), \ldots, (d_n, l_n)) \in \mathcal{X}^n$,
a regularized empirical risk minimization program with loss function
$L$ is defined as

$$w^* \in \operatorname*{argmin}_{w \in \mathbb{R}^r} \frac{C}{n} \sum_{i=1}^{n} L(l_i, f_w(d_i)) + \frac{1}{2} \|w\|_2^2, \tag{5}$$

where $f_w(x) = \langle \phi(x), w \rangle$ for a chosen feature
mapping $\phi: X \to \mathbb{R}^r$ taking points from $X$ to some
(possibly infinite) $r$-dimensional feature space and a hyperplane
normal $w \in \mathbb{R}^r$. Let
$K(x, y) = \langle \phi(x), \phi(y) \rangle$ be the kernel function
induced by the feature mapping $\phi$. The Representer Theorem
(Kimeldorf and Wahba 1971) implies that the minimizer $w^*$ lies in the
span of the functions $K(\cdot, d_i) \in \mathcal{H}$, where
$\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS). Therefore,
we consider $F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ such
that
$F_{\mathcal{D}}(y) = f_{w^*}(y) = \sum_{i=1}^{n} \alpha_i K(y, d_i)$
for some $\alpha_i \in \mathbb{R}$. An upper bound on the sensitivity
of this function follows from an argument provided by Hall, Rinaldo,
and Wasserman (2013) based on a technique of Bousquet and Elisseeff
(2002). In particular, writing $w'^*$ for the minimizer obtained on a
neighbouring database $\mathcal{D}'$, we have

$$S(F) = \sup_{y \in \mathcal{Y},\, \mathcal{D} \sim \mathcal{D}'} \left| f_{w^*}(y) - f_{w'^*}(y) \right| \le \frac{MC}{n} \sup_{y \in \mathcal{Y}} K(y, y).$$

If $K$ is $(2h, T)$-smooth, the error introduced is bounded, with
probability at least $1 - \beta$, by

$$O\left( \left( \frac{MC \sup_{y} K(y, y)}{n \epsilon} \log(1/\beta) \right)^{\frac{h}{\ell+h}} \right).$$

Note that this result holds with very mild assumptions, namely for any
convex and locally $M$-Lipschitz loss function (e.g., square-loss,
log-loss, hinge-loss) and any bounded kernel $K$. Figure 2 depicts SVM
learning with RBF kernel ($C = \sigma = 1$) on 1500 each of positive
(negative) Gaussian data with mean $[0.3, 0.5]$ ($[0.6, 0.4]$) and
covariance $[0.01, 0; 0, 0.01]$ ($0.01 * [1, 0.8; 0.8, 1.5]$) and
demonstrates the mechanism's uniform approximation of predictions, best
seen geometrically with the classifier's decision boundary.

Logistic Regression. Let now
$X = \{x \in [0,1]^\ell : \|x\|_2 \le 1\}$. Let furthermore
$\mathcal{X} = X \times [0,1]$ and $\mathcal{Y} = [0,1]^\ell$. The
logistic regressor can be seen as a function
$F: \mathcal{X}^n \times \mathcal{Y} \to \mathbb{R}$ such that
$F_{\mathcal{D}}(y) = \langle w^*, y \rangle$, where $w^*$ is the
minimizer of (5) when $\phi$ is the identity mapping and the loss
function is
$L(l, \langle w, d \rangle) = \log\left(1 + e^{-l \langle w, d \rangle}\right)$.
It is then possible to show that the error introduced by the Bernstein
mechanism is bounded, with probability at least $1 - \beta$, by

$$O\left( \frac{S(F)}{\epsilon} \log(1/\beta) \right),$$

since $F_{\mathcal{D}}(y)$ is a linear function. The prediction with
the sigmoid function achieves the same error bound, since it is
$1/4$-Lipschitz. A more detailed analysis is provided in Appendix G.

7 Conclusions 

In this paper we have considered the release of functions of test data
and privacy-sensitive training data. We have presented a simple yet
effective mechanism for this general setting, that makes use of
iterated Bernstein polynomials to approximate any regular function with
perturbations applied to the resulting coefficients. Both
$\epsilon$-differential privacy and utility rates are proved in general
for the mechanism, with corresponding lower bounds provided. A number
of example learners are analyzed, demonstrating the Bernstein
mechanism's versatility.

A Fixed Points of the Bernstein Operator

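Although this is a classical result, we show for completeness that
linear functions are fixed points of the Bernstein operator
$B_k = B_k^{(1)}$, for $k \ge 1$. Let $f(y) = my + q$, for
$m, q \in \mathbb{R}$ and $y \in [0,1]$. We have

$$B_k(f; y) = \sum_{\nu=0}^{k} f\left(\frac{\nu}{k}\right) b_{\nu,k}(y) = \frac{m}{k} \sum_{\nu=0}^{k} \nu\, b_{\nu,k}(y) + q \sum_{\nu=0}^{k} b_{\nu,k}(y) = my + q,$$

since $\sum_{\nu=0}^{k} b_{\nu,k}(y) = 1$ and
$\sum_{\nu=0}^{k} \nu\, b_{\nu,k}(y) = ky$.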

B Proof of Theorem 3 for $\ell = 1$

Let us fix $k$, a positive integer. As described in Algorithm 1, the
Bernstein mechanism perturbs the evaluation of the function
$F_{\mathcal{D}}$ on a cover of the interval $[0,1]$.

Lemma 10. Let $\epsilon > 0$. Then the Bernstein mechanism
$\mathcal{M}$ provides $\epsilon$-differential privacy.


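Proof. Let $\mathcal{D}' \in \mathcal{X}^n$ be a second database
differing from $\mathcal{D}$ in one entry only. Let furthermore
$f: \mathcal{X}^n \to \mathbb{R}^{k+1}$ be the map defined by

$$f(\mathcal{D}) = \left( F_{\mathcal{D}}(0), F_{\mathcal{D}}(1/k), \ldots, F_{\mathcal{D}}(1) \right).$$

Then

$$S(f) = \sup_{\mathcal{D} \sim \mathcal{D}'} \|f(\mathcal{D}) - f(\mathcal{D}')\|_1 = \sup_{\mathcal{D} \sim \mathcal{D}'} \sum_{\nu=0}^{k} \left| F_{\mathcal{D}}(\nu/k) - F_{\mathcal{D}'}(\nu/k) \right| \le S(F)(k+1).$$

According to Lemma 1 (applied with $k + 1$ in place of $d$), the
mechanism $\mathcal{M}$ provides $\epsilon$-differential privacy. □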


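In order to analyze the accuracy of our mechanism, we denote by
$\widetilde{B}_k^{(h)}(F_{\mathcal{D}}; y) = \sum_{\nu=0}^{k} \widetilde{F}_{\mathcal{D}}(\nu/k)\, b^{(h)}_{\nu,k}(y)$
the iterated Bernstein polynomial of order $h$ constructed using the
coefficients output by the mechanism $\mathcal{M}$. The error $\alpha$
introduced by the mechanism can be expressed as follows:

$$\alpha = \max_{y \in [0,1]} \left| F_{\mathcal{D}}(y) - \widetilde{B}_k^{(h)}(F_{\mathcal{D}}; y) \right| \tag{6}$$

$$\le \max_{y \in [0,1]} \left| B_k^{(h)}(F_{\mathcal{D}}; y) - \widetilde{B}_k^{(h)}(F_{\mathcal{D}}; y) \right| + \max_{y \in [0,1]} \left| F_{\mathcal{D}}(y) - B_k^{(h)}(F_{\mathcal{D}}; y) \right|. \tag{7}$$

For every $y \in [0,1]$, the first summand in Equation (7) consists of
the absolute value of an affine combination of independent
Laplace-distributed random variables.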

Proposition 11. Let Zq, ... ,Zk Lap(A), t > 0, and 
Ch be a constant depending on h only. Then: 


max 

ye[o.i] 


E 


Ah‘'!:l{y) 


> T 


< g-r/(ChA) _ 


We provide the proof of Proposition 11 in Appendix C. Proposition 11 implies that with probability at least $1 - \beta$ the first summand in Equation (7) is bounded by $O\left(S(F)\, k \log(1/\beta)/\epsilon\right)$. According to the regularity of $F_{\mathcal{D}}$, the second summand in Equation (7) can then be bounded by a decreasing function $g(k)$. All in all, the error in Equation (6) can be bounded as follows:

$$
\alpha = O\left( g(k) + \frac{S(F)\, k}{\epsilon} \log(1/\beta) \right) . \qquad (8)
$$

Since the second summand in Equation (8) is an increasing function of $k$, the optimal value for $k$ (up to a constant factor) is achieved when $k$ satisfies

$$
g(k) = \frac{S(F)\, k}{\epsilon} \log(1/\beta) . \qquad (9)
$$

Solving Equation (9) with the bounds for $g(k)$ provided in Theorems 5 and 6 and substituting the resulting value of $k$ into (8) proves the first two statements. The bound when $F_{\mathcal{D}}$ is linear follows from the fact that for $h = 1$ and $k = 1$ the approximation error in Equation (7) is zero. The error is thus bounded by $O\left(S(F) \log(1/\beta)/\epsilon\right)$. The running time of the mechanism and the running time for answering a query are linear in $k$ and hence upper bounded by a polynomial in $n$ and $1/\epsilon$, if $1/S(F) \leq \mathrm{poly}(n)$.
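The trade-off behind Equations (8) and (9) can be illustrated with a small search over $k$. The sketch below sets all hidden constants to 1 and uses a hypothetical decay $g(k) = k^{-2}$ together with a hypothetical sensitivity $S(F) = 1/n$; neither is taken from Theorems 5 and 6.

```python
import math

def error_bound(k, g, sensitivity, epsilon, beta):
    """Right-hand side of Equation (8) with all hidden constants set to 1."""
    return g(k) + sensitivity * k * math.log(1.0 / beta) / epsilon

def best_k(g, sensitivity, epsilon, beta, k_max=10_000):
    """Integer k in [1, k_max] that minimizes the bound of Equation (8)."""
    return min(range(1, k_max + 1),
               key=lambda k: error_bound(k, g, sensitivity, epsilon, beta))

def g(k):
    """Assumed approximation error (hypothetical decay rate)."""
    return k ** -2.0

n = 10_000
S, eps, beta = 1.0 / n, 1.0, 0.05  # illustrative parameter values
k_star = best_k(g, S, eps, beta)

# At the minimizer the two summands of Equation (8) are of the same order,
# which is the balance expressed by Equation (9).
noise_term = S * k_star * math.log(1.0 / beta) / eps
print(k_star, g(k_star), noise_term)
```

The exact minimizer differs from the solution of Equation (9) only by a constant factor, which is why the balancing condition suffices for the stated rates.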


C Proof of Proposition 11 

In order to prove the proposition, we make use of the following result.

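Theorem 12 (Proschan 1965). Suppose that $f\colon \mathbb{R} \to [0,1]$ is a log-concave density function such that $f(y) = f(-y)$ for every $y \in \mathbb{R}$. Let $Z_1, \ldots, Z_m$ be i.i.d. random variables with density $f$, and suppose that $(a_1, \ldots, a_m), (b_1, \ldots, b_m) \in [0,1]^m$ satisfy

(i) $a_1 \geq a_2 \geq \cdots \geq a_m$, $b_1 \geq b_2 \geq \cdots \geq b_m$;

(ii) $\sum_{i=1}^{k} b_i \leq \sum_{i=1}^{k} a_i$ for $k = 1, \ldots, m-1$;

(iii) $\sum_{i=1}^{m} a_i = \sum_{i=1}^{m} b_i = 1$.

Then, for all $t \geq 0$,

$$
P\left[ \left| \sum_{i=1}^{m} b_i Z_i \right| \geq t \right] \leq P\left[ \left| \sum_{i=1}^{m} a_i Z_i \right| \geq t \right] .
$$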

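Choosing $a_1 = 1$ and $a_j = 0$ for $j = 2, \ldots, m$, Theorem 12 implies

$$
P\left[ \left| \sum_{i=1}^{m} b_i Z_i \right| \geq t \right] \leq P\left[ |Z_1| \geq t \right] \qquad (10)
$$

for every $(b_1, \ldots, b_m) \in [0,1]^m$ which satisfies $\sum_{i=1}^{m} b_i = 1$. We then observe that the density function $h(y) = \exp(-|y|/\lambda)/(2\lambda)$ of the Laplace distribution is symmetric and log-concave. If $Z_i \sim \mathrm{Lap}(\lambda)$ are i.i.d. random variables for $i = 1, \ldots, m$, the right-hand side of Equation (10) satisfies

$$
P\left[ |Z_1| \geq t \right] = \exp\left(-\frac{t}{\lambda}\right) . \qquad (11)
$$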
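Equations (10) and (11) can be sanity-checked by simulation. The sketch below is a Monte Carlo illustration under assumed values ($\lambda = 1$, $t = 1$, and a hypothetical weight vector summing to 1); it is not part of the proof.

```python
import math
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def tail_frequency(weights, scale, t, trials, rng):
    """Empirical estimate of P[|sum_i weights[i] * Z_i| >= t]."""
    hits = 0
    for _ in range(trials):
        total = sum(w * laplace(scale, rng) for w in weights)
        if abs(total) >= t:
            hits += 1
    return hits / trials

rng = random.Random(0)
scale, t, trials = 1.0, 1.0, 50_000  # illustrative values

exact = math.exp(-t / scale)                            # Equation (11)
single = tail_frequency([1.0], scale, t, trials, rng)   # weights a = (1, 0, ..., 0)
mixed = tail_frequency([0.4, 0.3, 0.2, 0.1], scale, t, trials, rng)
print(exact, single, mixed)
```

Up to sampling error, `single` should match `exact`, and `mixed` should not exceed it, in line with Equation (10).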

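Although the bases $b^{(h)}_{\nu,k}$ are not always positive for $h \geq 2$, we observe that, for $y \in [0,1]$, $V(y) = \sum_{\nu=0}^{k} Z_\nu\, b^{(h)}_{\nu,k}(y)$ and $V'(y) = \sum_{\nu=0}^{k} Z_\nu \left| b^{(h)}_{\nu,k}(y) \right|$ have the same distribution, since the random variables $Z_\nu$ are i.i.d. and symmetric around zero. We can thus restrict our analysis to $V'(y)$. For $y \in [0,1]$, let $U(y) = \sum_{\nu=0}^{k} \left| b^{(h)}_{\nu,k}(y) \right|$. We first note that

$$
\begin{aligned}
U(y) &= \sum_{\nu=0}^{k} \left| \sum_{i=1}^{h} \binom{h}{i} (-1)^{i-1} B_k^{i-1}(b_{\nu,k}; y) \right| \\
&\leq \sum_{\nu=0}^{k} \sum_{i=1}^{h} \binom{h}{i} B_k^{i-1}(b_{\nu,k}; y) \\
&= \sum_{i=1}^{h} \binom{h}{i} \sum_{\nu=0}^{k} B_k^{i-1}(b_{\nu,k}; y) \\
&= \sum_{i=1}^{h} \binom{h}{i} \\
&= 2^h - 1 . \qquad (12)
\end{aligned}
$$

The inequality uses the triangle inequality together with the fact that the Bernstein operator maps nonnegative functions to nonnegative functions; the inner sum over $\nu$ equals 1 because $\sum_{\nu=0}^{k} b_{\nu,k} = 1$ and $B_k$ preserves constants.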
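The bound in Equation (12) can be checked numerically. The following sketch computes the iterated basis from the expansion $b^{(h)}_{\nu,k}(y) = \sum_{i=1}^{h} \binom{h}{i} (-1)^{i-1} B_k^{i-1}(b_{\nu,k}; y)$ used above; the function names and the chosen values of $k$, $h$ and $y$ are illustrative.

```python
from math import comb

def basis(nu, k, y):
    """Bernstein basis polynomial b_{nu,k}(y)."""
    return comb(k, nu) * y**nu * (1 - y)**(k - nu)

def iterated_basis(nu, k, h, y):
    """b^{(h)}_{nu,k}(y) = sum_{i=1}^h C(h,i) (-1)^(i-1) B_k^{i-1}(b_{nu,k}; y)."""
    grid = [mu / k for mu in range(k + 1)]
    values = [basis(nu, k, x) for x in grid]  # B_k^0(b_{nu,k}; .) on the grid
    point_value = basis(nu, k, y)             # B_k^0(b_{nu,k}; y)
    total = 0.0
    for i in range(1, h + 1):
        total += comb(h, i) * (-1) ** (i - 1) * point_value
        # Apply B_k once more, both at y and on the grid.
        point_value = sum(v * basis(mu, k, y) for mu, v in enumerate(values))
        values = [sum(v * basis(mu, k, x) for mu, v in enumerate(values))
                  for x in grid]
    return total

def abs_sum(k, h, y):
    """U(y): sum over nu of |b^{(h)}_{nu,k}(y)|."""
    return sum(abs(iterated_basis(nu, k, h, y)) for nu in range(k + 1))

for h in (1, 2, 3):
    worst = max(abs_sum(8, h, y) for y in (0.0, 0.05, 0.3, 0.5, 0.8, 1.0))
    print(h, worst, 2**h - 1)
```

For $h = 1$ the bases are nonnegative and sum to 1, so the bound is attained; for larger $h$ the printed values should stay below $2^h - 1$.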

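According to Equations (10) and (11), for every $y \in [0,1]$ and $\tau' > 0$ we have

$$
P\left[ \left| \sum_{\nu=0}^{k} Z_\nu \frac{\left| b^{(h)}_{\nu,k}(y) \right|}{U(y)} \right| \geq \tau' \right] \leq \exp\left(-\frac{\tau'}{\lambda}\right) .
$$

Choosing $\tau = U(y)\tau'$, we get

$$
P\left[ \left| \sum_{\nu=0}^{k} Z_\nu \left| b^{(h)}_{\nu,k}(y) \right| \right| \geq \tau \right] \leq \exp\left(-\frac{\tau}{U(y)\lambda}\right) \leq \exp\left(-\frac{\tau}{(2^h - 1)\lambda}\right)
$$

for every $y \in [0,1]$, concluding the proof.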


D Proof of Proposition 4 

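The proof of the proposition follows from the same argument provided in Appendix C, with some minor changes. In particular, it suffices to provide a tail bound for

$$
\max_{y \in [0,1]^\ell} \left| \sum_{j=1}^{\ell} \sum_{\nu_j=0}^{k} Z_{\nu_1,\ldots,\nu_\ell} \prod_{i=1}^{\ell} \left| b^{(h)}_{\nu_i,k}(y_i) \right| \right| ,
$$

where $\sum_{j=1}^{\ell} \sum_{\nu_j=0}^{k}$ is shorthand for the nested sum over all indices $\nu_1, \ldots, \nu_\ell \in \{0, \ldots, k\}$,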


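since, as observed in Appendix C, the random variables $Z_{\nu_1,\ldots,\nu_\ell}$ are i.i.d. and symmetric around zero. In order to apply Theorem 12 and conclude the proof, we need to upper bound

$$
U(y) = \sum_{j=1}^{\ell} \sum_{\nu_j=0}^{k} \prod_{i=1}^{\ell} \left| b^{(h)}_{\nu_i,k}(y_i) \right|
$$

for every $y \in [0,1]^\ell$. We have

$$
\begin{aligned}
U(y) &= \sum_{j=1}^{\ell} \sum_{\nu_j=0}^{k} \prod_{i=1}^{\ell} \left| b^{(h)}_{\nu_i,k}(y_i) \right| \\
&= \left( \sum_{j=2}^{\ell} \sum_{\nu_j=0}^{k} \prod_{i=2}^{\ell} \left| b^{(h)}_{\nu_i,k}(y_i) \right| \right) \sum_{\nu_1=0}^{k} \left| b^{(h)}_{\nu_1,k}(y_1) \right| \\
&\leq \left( 2^h - 1 \right)^{\ell} ,
\end{aligned}
$$

since, according to Equation (12),

$$
\sum_{\nu_j=0}^{k} \left| b^{(h)}_{\nu_j,k}(y_j) \right| \leq 2^h - 1
$$

for every $j \in \{1, \ldots, \ell\}$. The rest of the proof follows from the same computations done at the end of Appendix C.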

E Approximation Error of Multivariate 
Bernstein Polynomials 

In what follows, we assume that $f\colon [0,1]^\ell \to \mathbb{R}$ is a $(\gamma, L)$-Hölder continuous function. The proof for $(h, T)$-smooth functions follows the same argument, with minor changes. The argument we present here is by induction on $\ell$. The base case ($\ell = 1$) follows from the fact that the Bernstein polynomials $B_k(f; y_1)$ converge uniformly to $f$ on the interval $[0,1]$, as shown in Theorem 6. Assume now

$$
\left| B_k(f; y) - f(y) \right| \leq \ell L \left( \frac{1}{4k} \right)^{\gamma/2} ,
$$



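for every $y \in [0,1]^\ell$. Let $f\colon [0,1]^{\ell+1} \to \mathbb{R}$ be a $(\gamma, L)$-Hölder continuous function and let $B_k(f; y)$ be the corresponding Bernstein polynomial. For every $y = (y_1, \ldots, y_{\ell+1}) \in [0,1]^{\ell+1}$, let

$$
G(f; y) = \sum_{j=1}^{\ell} \sum_{\nu_j=0}^{k} f\!\left( \frac{\nu_1}{k}, \ldots, \frac{\nu_\ell}{k}, y_{\ell+1} \right) \prod_{i=1}^{\ell} b_{\nu_i,k}(y_i) .
$$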


The error $|B_k(f; y) - f(y)|$ can then be bounded by

$$
\begin{aligned}
\left| B_k(f; y) - f(y) \right| &\leq \left| B_k(f; y) - G(f; y) \right| + \left| G(f; y) - f(y) \right| && (13) \\
&\leq L \left( \frac{1}{4k} \right)^{\gamma/2} + \ell L \left( \frac{1}{4k} \right)^{\gamma/2} \\
&= (\ell + 1) L \left( \frac{1}{4k} \right)^{\gamma/2} .
\end{aligned}
$$

In fact, the second term of Equation (13) is the error of the Bernstein polynomial of $f$ seen as a function of $y_1, \ldots, y_\ell$ only. The corresponding bound then follows from the inductive hypothesis. On the other hand, the first summand corresponds to the approximation error of the (univariate) Bernstein polynomial of $G(f; y)$ as a function of the remaining variable $y_{\ell+1}$. The statement for $(h, T)$-smooth functions is similarly obtained by replacing $B_k$ with $B_k^{(h)}$ and using the bound of Theorem 5.
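The multivariate bound can be illustrated for $\ell = 2$. The sketch below uses the hypothetical test function $f(y_1, y_2) = |y_1 - 1/2| + |y_2 - 1/2|$, which is 1-Lipschitz in each coordinate; treating it as $(\gamma, L)$-Hölder with $\gamma = 1$ and $L = 1$ is an assumption, since the norm in the paper's definition is not restated in this appendix. The bound $\ell L (1/(4k))^{\gamma/2}$ is the form assumed in this appendix's inductive argument.

```python
from math import comb

def basis(nu, k, y):
    """Bernstein basis polynomial b_{nu,k}(y)."""
    return comb(k, nu) * y**nu * (1 - y)**(k - nu)

def bernstein_2d(f, k, y1, y2):
    """Bivariate Bernstein polynomial of f with degree k in both variables."""
    return sum(
        f(n1 / k, n2 / k) * basis(n1, k, y1) * basis(n2, k, y2)
        for n1 in range(k + 1)
        for n2 in range(k + 1)
    )

def f(y1, y2):
    """Hypothetical test function, 1-Lipschitz in each coordinate."""
    return abs(y1 - 0.5) + abs(y2 - 0.5)

def max_error(k, points=11):
    """Largest approximation error over a uniform grid of evaluation points."""
    grid = [i / (points - 1) for i in range(points)]
    return max(abs(bernstein_2d(f, k, a, b) - f(a, b)) for a in grid for b in grid)

ell, L, gamma = 2, 1.0, 1.0  # assumed Hölder parameters for f
for k in (4, 10, 20):
    bound = ell * L * (1.0 / (4 * k)) ** (gamma / 2)
    print(k, max_error(k), bound)
```

The observed error should stay below the bound and shrink as $k$ grows.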

F Naive Bayes Classification 

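In this section, we show how to bound the sensitivity of a naive Bayes classifier $F_{\mathcal{D}}$, as defined in Section 6. We have

$$
\begin{aligned}
S(F) &= \sup_{y \in [0,1]^\ell,\ \mathcal{D} \sim \mathcal{D}'} \left| F(\mathcal{D}, y) - F(\mathcal{D}', y) \right| \\
&\leq 2 \sup_{l \in \{l^+, l^-\},\ y \in [0,1]^\ell,\ \mathcal{D} \sim \mathcal{D}'} \left| P(y \mid l, \mathcal{D})\, P(l \mid \mathcal{D}) - P(y \mid l, \mathcal{D}')\, P(l \mid \mathcal{D}') \right| .
\end{aligned}
$$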

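We assume that a class probability $P(l \mid \mathcal{D})$ is estimated using the corresponding relative frequency in the training set $\mathcal{D}$. Therefore, for $\mathcal{D} \sim \mathcal{D}'$, $P(l \mid \mathcal{D}) \leq P(l \mid \mathcal{D}') + 1/n$. Assume now that there exists $0 < \xi < 1$ such that, for every $y \in [0,1]^\ell$, $\mathcal{D} \sim \mathcal{D}' \in \mathcal{X}^n$, $l \in \{l^+, l^-\}$ and $i \in \{1, \ldots, \ell\}$,

$$
\left| P(y_i \mid l, \mathcal{D}) - P(y_i \mid l, \mathcal{D}') \right| \leq \xi
$$

holds. We then have

$$
\begin{aligned}
P(y \mid l, \mathcal{D}) &\propto \prod_{i=1}^{\ell} P(y_i \mid l, \mathcal{D}) \\
&\leq \prod_{i=1}^{\ell} \left( P(y_i \mid l, \mathcal{D}') + \xi \right) \\
&\leq \prod_{i=1}^{\ell} P(y_i \mid l, \mathcal{D}') + (2^\ell - 1)\xi ,
\end{aligned}
$$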


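where the last inequality follows from the fact that there are $2^\ell - 1$ cross products and each one of them has at least a $\xi$ factor. If each (unidimensional) likelihood is estimated using KDE (John and Langley 1995) with a Gaussian kernel of bandwidth $b$, $\xi$ corresponds to the upper bound on the sensitivity of KDE shown in Section 6. Putting all the pieces together, we obtain

$$
\begin{aligned}
S(F) &\leq 2 \sup_{l \in \{l^+, l^-\},\ y \in [0,1]^\ell,\ \mathcal{D} \sim \mathcal{D}'} \left( \frac{2^\ell - 1}{n \sqrt{2\pi}\, b}\, P(l \mid \mathcal{D}') + \frac{1}{n}\, P(y \mid l, \mathcal{D}) \right) \\
&\leq 2 \left( \frac{2^\ell - 1}{n \sqrt{2\pi}\, b} + \frac{1}{n} \right) .
\end{aligned}
$$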
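The cross-product inequality used above, $\prod_i (p_i + \xi) \leq \prod_i p_i + (2^\ell - 1)\xi$, can be checked on random inputs. The sketch assumes that every factor $p_i$ lies in $[0,1]$ and that $0 < \xi < 1$, so that each cross product is at most $\xi$; the function name and sampling ranges are illustrative.

```python
import math
import random

def cross_product_gap(probs, xi):
    """Return rhs - lhs of  prod(p_i + xi) <= prod(p_i) + (2^l - 1) * xi."""
    lhs = math.prod(p + xi for p in probs)
    rhs = math.prod(probs) + (2 ** len(probs) - 1) * xi
    return rhs - lhs

rng = random.Random(0)
# Smallest gap over random factors p_i in [0, 1) and xi in (0, 1), l = 1..6.
worst_gap = min(
    cross_product_gap([rng.random() for _ in range(ell)], rng.uniform(0.001, 0.999))
    for ell in range(1, 7)
    for _ in range(500)
)
print(worst_gap)
```

A nonnegative `worst_gap` is consistent with the inequality; note that the argument relies on the factors being at most 1.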



G Logistic Regression 

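Let $X = \{x \in [0,1]^\ell : \|x\|_2 \leq 1\}$. Let furthermore $\mathcal{X} = X \times [0,1]$ and $\mathcal{Y} = [0,1]^\ell$. The logistic regressor can be seen as a function $F\colon \mathcal{X}^n \times \mathcal{Y} \to [0,1]$ such that, for $\mathcal{D} = ((d_1, l_1), (d_2, l_2), \ldots, (d_n, l_n)) \in \mathcal{X}^n$, $F_{\mathcal{D}}(y) = 1/(1 + \exp(-\langle w^*, y \rangle))$, where $w^*$ is such that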

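$$
w^* \in \arg\min_{w \in \mathbb{R}^\ell} \frac{C}{n} \sum_{i=1}^{n} \log\left( 1 + \exp\left( -l_i \langle w, d_i \rangle \right) \right) + \frac{1}{2} \|w\|_2^2 .
$$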

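In order to compute $S(F)$, we first observe that the sigmoid function is $1/4$-Lipschitz. Denoting by $w \sim w'$ the minimizers obtained from input databases $\mathcal{D} \sim \mathcal{D}'$, we have

$$
\begin{aligned}
S(F) &\leq \sup_{y \in \mathcal{Y},\ w \sim w'} \frac{1}{4} \left| \langle w - w', y \rangle \right| \\
&\leq \sup_{y \in \mathcal{Y},\ w \sim w'} \frac{1}{4} \|w - w'\|_2 \|y\|_2 ,
\end{aligned}
$$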



where the last inequality follows from an application of the Cauchy-Schwarz inequality. Chaudhuri and Monteleoni (2008) showed $\sup_{w \sim w'} \|w - w'\|_2 \leq 2C/n$. Since $\|y\|_2 \leq \sqrt{\ell}$ for every $y \in \mathcal{Y}$, we have $S(F) \leq C\sqrt{\ell}/(2n)$. Since $F_{\mathcal{D}}$ is an $(h, T)$-smooth function for any positive integer $h$, with probability at least $1 - \beta$ the error introduced by the mechanism is bounded by

$$
O\left( \frac{C}{n\epsilon} \log(1/\beta) \right) .
$$

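We note that defining $F_{\mathcal{D}}(y) = \langle w^*, y \rangle$ the previous bound can be improved to

$$
O\left( \frac{S(F)}{\epsilon} \log(1/\beta) \right) = O\left( \frac{C\sqrt{\ell}}{n\epsilon} \log(1/\beta) \right) ,
$$

since $S(F) \leq 2C\sqrt{\ell}/n$ and $F_{\mathcal{D}}(y)$ is a linear function. The prediction with the sigmoid function achieves the same error bound, the sigmoid being $1/4$-Lipschitz.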
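The two sensitivity bounds of this section are simple enough to evaluate directly. The sketch below computes them for illustrative values of $C$, $\ell$ and $n$, and checks the $1/4$-Lipschitz property of the sigmoid by finite differences on a grid; the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sensitivity_sigmoid_output(C, ell, n):
    """Bound C * sqrt(ell) / (2n) on S(F) for the sigmoid prediction."""
    return C * math.sqrt(ell) / (2 * n)

def sensitivity_linear_output(C, ell, n):
    """Bound 2C * sqrt(ell) / n on S(F) for the linear score <w*, y>."""
    return 2 * C * math.sqrt(ell) / n

# Finite-difference check that the sigmoid is 1/4-Lipschitz on [-8, 8].
zs = [i / 10 for i in range(-80, 81)]
max_slope = max(abs(sigmoid(b) - sigmoid(a)) / (b - a) for a, b in zip(zs, zs[1:]))

C, ell, n = 1.0, 4, 100  # illustrative values
print(max_slope)
print(sensitivity_sigmoid_output(C, ell, n), sensitivity_linear_output(C, ell, n))
```

The factor of 4 between the two bounds is exactly the Lipschitz constant of the sigmoid.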



